Global Pre-ordering for Improving Sublanguage Translation

Authors

  • Masaru Fuji
  • Masao Utiyama
  • Eiichiro Sumita
  • Yuji Matsumoto
Abstract

When translating formal documents, capturing the sentence structure specific to the sublanguage is essential for obtaining high-quality translations. This paper proposes a novel global reordering method with a particular focus on long-distance reordering, designed to capture the global sentence structure of a sublanguage. The proposed method learns global reordering models from a non-annotated parallel corpus and works in conjunction with conventional syntactic reordering. Experimental results on the patent abstract sublanguage show substantial gains of more than 25 points in the RIBES metric and comparable BLEU scores for both Japanese-to-English and English-to-Japanese translation.

Similar resources

Patent Claim Translation based on Sublanguage-specific Sentence Structure

Patent claim sentences, despite their legal importance in patent documents, still pose difficulties for state-of-the-art statistical machine translation (SMT) systems owing to their extreme lengths and their special sentence structure. This paper describes a method for improving the translation quality of claim sentences, by taking into account the features specific to the claim sublanguage. Ou...

of MT Summit XV

Patent claim sentences, despite their legal importance in patent documents, still pose difficulties for state-of-the-art statistical machine translation (SMT) systems owing to their extreme lengths and their special sentence structure. This paper describes a method for improving the translation quality of claim sentences, by taking into account the features specific to the claim...

MTIL17: English to Indian Langauge Statistical Machine Translation

English to Indian language machine translation poses the challenge of structural and morphological divergence. This paper describes English to Indian language statistical machine translation using pre-ordering and suffix separation. The pre-ordering uses rules to transfer the structure of the source sentences prior to training and translation. This syntactic restructuring helps statistical mach...

Controlled Languages for Machine Translation: State of the Art

A controlled language is a subset of a natural language with artificially restricted vocabulary, grammar, and style. Texts written in a controlled language are usually less complex and less ambiguous than those written in an uncontrolled language. The use of a controlled language therefore produces better results in machine translation. On the other hand, a controlled language reduces the power...

A joint inference of deep case analysis and zero subject generation for Japanese-to-English statistical machine translation

We present a simple joint inference of deep case analysis and zero subject generation for the pre-ordering in Japanese-to-English machine translation. The detection of subjects and objects from Japanese sentences is more difficult than that from English, while it is the key process to generate correct English word orders. In addition, subjects are often omitted in Japanese when they are inferabl...

Publication date: 2016